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INTRODUCTION 



Objecti ve 

The basic objective of computer center management is to provide high 
computer system performance at a reasonable cost under conditions of fluctuat- 
ing workload and fixed computer resources. In order to satisfy this objective, 
it is necessary to forecast the performance and resource utilization which 
would result from a permanent and significant change in workload, if resources 
remain unchanged. If projected performance is unsatisfactory or the antici- 
pated resource utilization is low, the forecast provides a warning that 
resources must be expanded or contracted, respectively. Once the condition 
of saturation or under utilization has been anticipated, it is necessary to 
forecast the performance and resource utilization which would be obtained when 
resources are changed. In addition to providing the above forecasts, the 
requirement to achieve high performance, when constrained by computer capac- 
ity, budget, schedule and workload restraints, suggests the need for a model 
of computer performance and resource allocation. Such a model would provide 
a tool for producing some of the information which computer center management 
requires for achieving high computer performance. Thus, this paper involves 
two major analytical efforts: (1) the development of performance and resource 

usage forecasting equations, and (2) the development of a model for analyzing 
computer performance and resource allocation. 

Scientific computer center management has little control over the 
variables which determine computer center performance. Unlike management data 
processing, the scientific center deals with external users who program a 
variety of problems in a variety of languages and with varying degrees of 



skills. In addition, inputs cannot be scheduled or sequenced to any 
significant degree in order to take advantage of the processing characteris- 
tics of various jobs. Usually, computer management can only influence 
performance by using a charging algorithm which discourages large jobs 
during the prime shift. Tuning of the operating system is done reluctantly, 
if at all, because some computer vendors will not support a modified system 
or will refuse to provide the information needed for making operating system 
changes. Management may be able to make hardware changes or replace the 
computer system when the workload threatens to saturate the system. 

A model for system optimization would have to be compatible with the 
above limitations. In view of the peculiarities of the scientific computing 
environment, the modelling approach which has been developed seeks to use 
charging algorithms and system modifications as the primary system improve- 
ment alternatives and to employ changes to the operating system scheduler 
and dispatcher as a possible secondary remedy. The key control variable which 
has been identified as a means of attempting system optimization is the job 
mix. An explanation of the use of this variable appears in the next section. 

Job Mix 

One variable which has considerable influence over computer performance 
is job mix. This refers to the composition of a group of jobs, by type of 
application, programming language or use of resources, which is input to a 
computer during a specified interval of time. Job mix is of no interest to 
the individual user; his measure of performance is turnaround or response 
time. However, job mix is of interest to computer center management because 
it must view the user workload and allocation of computer resources in total. 

A good job mix, from the standpoint of computer center management, has jobs 
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which are compatible, i.e. jobs which do not compete to a great extent for the 
use of the same resources at the same time and, hence, are non-interfering 
jobs. Equivalently, jobs are compatible if the execution time of each does 
not depend on the resource usage of the other jobs. 

It should be noted that a good job mix, from the standpoint of computer 
center management, may be antithetical to the needs of some users. Therefore, 
when job mix is used as a performance factor, we should also consider its 
effect on user turnaround time. User job production constraints could be used 
in the model as a means of satisfying this requirement. If compatible and 
incompatible job mixes could be identified, this information could be used to 
implement one or more of the following performance improvement techniques: 

• Adjust the job charging algorithm so that incompatible jobs will be 
charged more than compatible jobs during peak computer usage and at the regular 
rate during non-peak periods. This selective charging technique can be applied 
by job type, such as FORTRAN compilations, or on the basis of resource usage, 
independent of job type. An advantage of the first method is that the customer 
would possess definitive information about computer costs in advance of job 
execution because the charge would be based on type of job. A disadvantage 
is that, because customer relations must be considered, it may be infeasible 
for computer center management to selectively charge in such a way that all 
jobs in specified categories are penalized for prime shift usage, regardless 
of resource usage. An advantage of the second method is that it is more 
equitable because pricing is based only on resource usage. However, the user 
may not be aware of resource usage prior to job execution. The resource usage 
would be known, once the job is executed; however, in scientific computing, 
many jobs are executed only once. 



3 



* Eliminate or reduce job mix as a major performance consideration by 
expanding or changing the hardware configuration so that performance is not 
critically dependent on job mix. Of course, the implementation of this 
alternative may involve a considerable expenditure for hardware. 

• Modify the operating system so that compatible jobs will have a higher 
priority for job initiation and execution than incompatible jobs. It should 
be noted that, whereas the job charging approach attempts to influence job 
mix prior to the input of jobs to the computer center, this approach attempts 
to change the input job mix to a compatible mix, after jobs are read into 

the input queue. Whereas the use of a different charging algorithm may pro- 
duce a permanent change in job mix, a change in the operating system will only 
affect job mix during short time intervals, since low priority jobs cannot be 
queued indefinitely. In addition, if the input queue becomes too long, jobs 
must be temporarily transferred to auxiliary storage and later returned to 
main memory, resulting in an increase in processing overhead. Job priority 
determination for scheduling and dispatching purposes may be made by job type 
or resource usage. An advantage of the former method is that priority determin- 
ation is unambiguous because job type will be known to the operating system. 

A disadvantage is that the priority assignment is not selective, i.e. entire 
jobs are assigned a single priority, regardless of resource usage. Assignment 
of job priority by resource usage would be superior to assignment by job type, 
if resource usage were known or could be estimated accurately in advance of 
execution. However, this information is not known for a high percentage of 
jobs until after the first execution. Once the job has been executed, the 
user can request resources based on the resource usage of the last execution. 
However, these requests are often so conservative that there may be a big 
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difference between requested and actual resource usage, resulting in an 
inefficient allocation of resources by the operating system. 

It is natural to speculate on ways to identify the ideal job mix, where 
ideal may be defined as a job mix which will optimize some performance variable, 
such as job elapsed time, subject to user and computer management specified 
constraints. If an ideal mix can be identified, existing performance can be 
evaluated with respect to the ideal. The ideal mix could be used as the goal 
for future performance achievement, and the ideal mix could be used as a 
measure of the usefulness of alternative performance improvement approaches. 

The next section describes the scope of this paper with respect to modelling 
and quantifying the relationships which have been discussed. 

SCOPE 

The scope of this paper is limited to the following three items: 

1. Construct a model for computing the ideal job mix. The ideal job mix 
would be compared with the actual job mix in order to determine which changes 
in computer service charging policy, operating system usage or hardware con- 
figuration may be desirable. Only the mathematical formulation of the model 
is presented in this paper. The numerical solution of the model for a particu- 
lar computer center operation is outside the scope of this paper. In order to 
obtain a numerical solution of the model for an operation with many types of 
jobs, it is necessary to estimate the values of many coefficients and param- 
eters. Work is in progress to complete the estimation of these factors for 

the Naval Weapons Center (NWC), China Lake, California, Computer Center UNIVAC 
★ 

1108. The complete model numerical solution pertaining to this installation 
★ 

The analysis of data in this report pertains to a UNIVAC 1108 which was installed 
at NWC during the period of data collection. This installation was subsequently 
upgraded to a UNIVAC 1110. 
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will be the subject of a subsequent paper. Also to be included in this future 
paper will be a detailed analysis of the problems of implementing a perform- 
ance improvement in terms of a charging algorithm, operating system modifica- 
tion or hardware change, should such an improvement appear necessary after 
obtaining a numerical solution to the model. 

2. Analyze the statistical correlations among performance and resource 
usage variables in order to estimate the degree of association among these 
variables. This information is useful for identifying relationships which 
can be used in computer center resource management and for the development of 
computer center performance and resource usage estimating equations. 

3. Formulate regression equations for forecasting computer center perform- 
ance and for estimating performance coefficients which are used in the model. 
These equations would provide the capability of estimating the effect on per- 
formance of changes in resource usage, job mix or workload. Regression 
equations are also required for estimating resource usage and resource coeffi- 
cients which are used in the model. The formulation of the resource usage 
equations was beyond the scope of the paper. 

Although each of the above items involves a separate analysis, the 
items are related because the regression equations are used to provide 
estimates of model coefficients and the correlation coefficients are used to 
develop the regression analysis. 

Items 2 and 3 involved the analysis of system log data collected from 
the NWC. The sample data used in the analysis is summarized in Appendix I. 
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COMPUTER CENTER RESOURCE ALLOCATION MODEL 



Operating Environment 

The type of operating environment for which this model is applicable 
is batch and terminal processing with multiprogramming. Using the terminology 
of NWC, jobs executed from a terminal will be called demand jobs. The types 
of programs which are executed can be divided roughly into three classes: 
compilations, execution of compiled programs (production) and the use of a 
variety of utility programs. Each of these categories is executed in both 
batch and demand modes and the two modes are used on the same shift. 

Job Mix 

Job mix as a controllable performance variable has been discussed in 
previous sections. On what basis should the ideal mix be chosen? One approach 
is to attempt to find that mix which will simultaneously satisfy job produc- 
tion, computer center budget, resource capacity and utilization constraints 
and minimize total elapsed time over all jobs. Job mix during time periods 
of equal duration (say one hour) can be represented by the number of jobs 
of program type j 

X-| , • • • »Xj > • • • ( 1 ) 

which are executed during the specified time period. Alternately, the mix 
can be represented by 

X-j/Xy, . . . jXj/X-j., . . . » X n /Xy (2) 

where Xy is the total number of jobs executed in the designated time period. 
For our purposes (1) will be the number of jobs executed of a given program 
type (FORTRAN compilations) per hour. 
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Performance Variables 



The achievement of user performance objectives should result in the 
maximization of user satisfaction or utility, overall, taking into considera- 
tion various user and computer center constraints. A model which employs the 
concept of relative value to a user as a function of system response time and 
utilization is described in [1], Ideally, measures of user utility or value 
should be used in a performance model. However, this type of data is very diffi- 
cult to collect. The reference does not explain how this data was or could be 
obtained. Lacking a direct measure of the value to the user of computer system 
performance, a surrogate variable, elapsed time, will be utilized. It will be 
assumed that value to the user is inversely proportional to elapsed time. The 
minimization of this variable over all jobs, is equivalent to maximizing value 
to the users as a whole. For batch and demand jobs elapsed time is defined as 
the interval of time between the initiation and termination of CPU activity on a 
job. Elapsed time involves all CPU and I/O file activity and wait times between 
the time that the CPU starts and terminates job execution. The elapsed time 
excludes the following: time required to read the job from the card reader or 

terminal, time the job resides in internal storage prior to the start of execu- 
tion by the CPU and time required to print job output. Job output is spooled 
to peripheral devices during execution and printing is performed as a separate 
task. For purposes of this model and the analyses which appear in subsequent 
sections of this paper, elapsed time per program type j, T., will be the 

J 

total of all elapsed time for program type j during a one hour period. A 
primary reason for using this particular definition is that this is the form 
in which data is recorded by the system logging function at the Naval Weapons 
Center Computer Center. This system captures total elapsed time and resource 
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usage data by program type for each hour of operation but does not record 

these data by individual job. However, the number of jobs executed per hour 

by program type, x., is recorded so that the mean elapsed time per job, t., 

J J 

may be computed. With respect to program type j, elapsed time 



T. 

J 



= C 



W. 

J 



(3) 



where C. is CPU active time and W. is CPU inactive or wait time. Wait time 
J J 



"j ’ W jl + W j2 + W J3 



(4) 



where W., is the total wait time per hour incurred by program type j while 
J * 

waiting for its own I/O operation to complete and no CPU processing is occurring; 

W^ is the total wait time per hour incurred by program type j while waiting 

for the CPU to become available; and W.~ is the total wait time per hour 

J 'J 

incurred by program type j while waiting for an I/O device to become avail- 
able and no CPU processing is occurring. It should be noted that, in general, 
there is an overlap of CPU and I/O operations; W., is the amount of I/O time 

J * 

of program type j which is not overlapped. 

With respect to the components of elapsed time T., the following 

J 

functional relationships exist: 

• C- is determined by the CPU requirements of program type j which 

J are determined in part by x. and job core requirements. 

J 

• W., is determined by the I/O requirements of program type j which 

J are determined in part by x-, C. and job core requirements. 

J J 

Job mix has no effect on Cj and W^ . 

• W^ is determined by the CPU requirements of other program types. 

• Wjg is determined by the I/O requirements of other program types. 
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Thus job mix can be used as a 



Job mix does have an effect on and W^. 

control variable to control the component of elapsed time. The 

NWC data recording includes T. and C., from which W. can be computed; 

J J J 

however, , W^, and are not individually recorded, so that control 

of Wj 2 + ^j 3 has t° be exercised indirectly via VK. 

Other candidates for performance variable are CPU time and total wait 
time. Neither of these variables alone accounts for both CPU activity and 
the delays resulting from resource usage conflicts created by the presence 
of multiple jobs or the delays caused by non-overlapped CPU and I/O activities. 



Optimal Job Mix 

The optimal job mix is defined to be that mix which will result in the 
minimization of elapsed time over all jobs executed in a specified time period. 
If performance and resource usage vary considerably over a number of time 
periods, it would be necessary to identify an optimum job mix for each time 
period. However, this consideration is not a critical issue with respect to 
the approach used to develop the model. It is a secondary issue concerned 
with estimating model coefficients for different time periods and employing 
them in the appropriate time period solution. If the set 



x 



* 

r* 





( 5 ) 



constitutes the optimal mix, given in units of jobs per hour for program type 
j, then the ratio 

M = Y t -x . / 7 t .x . (6) 

j 4 J J 5 J J 

is a figure of merit, where t- is the estimated mean elapsed time in 

J 

seconds per job for program type j, and t. and x, are the actual 

J J 
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elapsed times and number of jobs, respectively. The closer M is to one, when 



various constraints are considered , the better the performance. The goal of 
various performance improvement approaches, such as charging algorithms, operating 
system modification and hardware changes, would be evaluated with respect to their 
ability to achieve the optimal job mix given by (5). As mentioned previously, 
there may be a different optimal mix for each time period. It should be noted 
that it may be possible to have more than one optimal job mix. In addition, total 
elapsed time may be insensitive to job mix. It will be possible to analyze these 
factors, once numerical solutions to the mathematical program are available. 

MATHEMATICAL PROGRAM 



Objective Function 

The use of mathematical programming for computer equipment selection and 
computer center resource allocation has been demonstrated in [2]. In order to 
estimate the optimal job mix, a mathematical program is formulated based on the 
minimization of T. and the satisfaction of constraints involving the x. and 

J J 

resource usage A.., where i and j are the resource and program type, 

* J 

respectively. For the purpose of generality of notation, the form A., will be 

' \J 

used to designate all resource usages, including CPU time C.. Recalling (3) and 

J 

(4) and the statements which followed concerning the functional relationships 
involving C. and W., we can write 

J J 

T j = f j ( x i » - * * » x j » • * - » x n ’ A i i > • • • > A ii > • • • > A m i ; 



A 1 j * • • • ,A i j * • • • ,A mj ;A 1 n ’ • • • ,A i n ’ • " ,A mn ^ 



(7) 



The A., are a function of the use of other resources A. . and the 

1 J KJ 

x.. An example is the number of I/O references per hour as a function of 

J 

CPU time per hour (a measure of program duration), core usage per hour (a 
measure of program size) and number of jobs per hour (a measure of resource 
usage). Thus, 
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A ij ■ whe| - e k ^j- 



(8) 



Certain A. . , such as core usage, are a function of x. directly. 

* J J 

In other cases the A., can be made a function of x. indirectly via the 

• J J 

relationship between Ay and Xj. An example is that CPU time per hour 
(A..) is a function of core usage per hour (A, .). Both are a function of 

I J KJ 

number of jobs per hour (x.), so that CPU time can be related to jobs per 

J 

hour alone. Thus (8) becomes 

A. . = g .(x . ) . (9) 

T J J J 

Although the degree of association between dependent and independent 
variables may be higher in (8) than in (9), the use of (9) is required because 
the Ay are not really independent variables; their values depend upon the 
nature of the computer operation to be performed. In addition, the Ay are 
not known in advance of a computer run; they must be estimated from the x.. 

J 

In contrast, the x. are determined external to the computer system by the 

J 

users. Their values and the associated job mix represent the demands placed 
on the computer center. In view of (9) equation (7) reduces to 

Tj ~ f j (x*|,...,Xj,...,x^). (10) 

A discussion of the degree of linear association among the variables 
in (9) and (10) and the appropriateness of a linear model is presented in the 
correlation and regression sections of this paper. For the present we will be 
interested in (9) and (10) for the purpose of formulating a mathematical pro- 
gramming model. 

Since we wish to minimize the elapsed time per hour over all program 
types, the objective function in the mathematical programming formulation 
becomes 
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( 11 ) 



Minimize Z = l T\ - l f •(x 1 . ,Xj,. .. ,x n ) 

J <3 

Constrai nts 

1 . Resource Usage 

Total resource usage must not exceed the available resource in a 
specified time period. Thus, using (9) 



A ij = l 9j( x j) ^ b i * i = 1 , . . . ,m 
0 

where is the capacity of the i— resource, such as the number of CPU 
seconds available per hour. Since in certain instances, it may be infeasible 
or of no interest to account for all types of programs which are executed at 
a computer center, a load factor , representing the fraction of the total 
capacity b^ which is used by the n program types (excluding operating 
system load), is applied as follows: 



i 

J 




1 1 y • • • jIT) • 



( 12 ) 



2. Production 

User requirements in terms of number of jobs of program type j 
executed per hour, N., must be satisfied 

J 



x.^N., j = 1 , . . . ,n and N . ^ 0. 
J J J 



(13) 



The production constraints may be viewed as an indirect way of specifying 
priority by program type in the model. 

3. The computer center should receive revenue per hour at least equal to 
the total budget available per hour, B, to operate the computer center. Thus 



l a. .p.xj * B 

where a., is the units of resource i per job used by program type j 

* vJ 

p.j is the price per unit of resource i which is charged to the users. 



I 

j 



(14) 

and 
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4. Utilization 



The computer center may be interested in maintaining a minimum CPU 

utilization U. for each program type. If this is the case, a constraint 
J A 1 ‘ 

would be involved of the form -y 1 ^ £ U^ or U^Tj-A^ £ 0, where A^ is 

J 

the CPU resource usage. Since A^ and Tj can be expressed in terms of 
x., we have 

J 



U j f j ( x j ) ' h j( x j) ^ °» j = l,...,n. 



(15) 



Thus, (11) through (15) constitute a mathematical program, where the 
solution variables are the x- and we wish to find the optimum job mix 

J 

"k 'k ic 

Xj,...,Xj,...,x n . This formulation does not depend upon the existence of 
linear functions in (11) through (15), although the feasibility of obtaining 
numerical solutions depends upon the existence of linear or separable (non- 
linear but no product of x. terms) functions. Assuming for the present 

J 

that a linear function can be used to estimate T i with sufficient accuracy, 

J 

(10) can be written as 



T j = V j0 + v jl x l + v J 2 x 2 + 



.. + v jk x k + ... * v Jn x n 



(16) 



where the v^ are estimated regression coefficients and 1 ^ j £ n. When 

/s 

(16) is divided by x., we obtain t., the estimated elapsed time per job 

J J 

for program type j. Using (16) in the objective function (11), we obtain 



Minimize Z = J J v jkV 
j k 

★ A 
Once the x. are determined, they can be used with the t. to obtain the 
J J 

figure of merit of (6). 

Similarly, if we assume that (9) can be estimated with sufficient 
accuracy by 



14 



where a., and 8- are estimated regression coefficients, (12) can be 
^ J J 

expressed by 



B- -x . 
iJ J 



i.b. 

i l 



- I 

j 



a. 



1J 



i = 1 , 



,m. 



( 19 ) 



A 

If (18) is divided by x., we obtain a,., the estimated resource usage per 

J * J 

job for program type j. The a.- are used as estimates of a., in (14). 

I J ' vJ 

Applying (16) and (18), (15) becomes 



U 



j l v jk x k 



B u x j £a ij 



' Yjo’ 



j 1 , . . . ,n, 



( 20 ) 



A A 

where L). is the sample mean utilization for program type j and the a,. 

J * J 

and B-i • constitute the estimated regression coefficients of (18) for CPU 
* J 

usage. 

Thus, (13), (14) with a., substituted for a.., (17), (19) and (20) 

^ J * J 

constitute a linear program. 

Fortunately, it is not necessary to obtain an integer solution for the 
x. because, as actually measured, the number of jobs per hour of program 
type j is, in general, a non-integer value, because some jobs are in process 
at the termination of each one hour measurement period. A fraction of each 
job in process is recorded against the period just concluded and the remaining 
fraction is recorded against the next period. 

After obtaining the optimal solution (5), the quantities listed below 
could be computed. These calculations would indicate how close the actual 
computer center operation is to the optimal operation. Of course, lack of 
optimality would only exist in the context of the mathematical definition of 
the linear program. Computer center management may wish to use different 
sets of constraints and definitions of optimality. 



(1) Difference Between Total Actual and Total Optimal Elapsed Time 



Vj - 1 



( 21 ) 



(2) Difference Between Actual and Optimal Production 



x • - x., j = 1 ,. . • ,n 

J J 



( 22 ) 



(3) Difference Between Actual and Optimal Revenue 



I I A ijPl - l l a jjPi x* 

J 1 J J 1 J J 



(23) 



(4) Difference Between Actual and Optimal Resource Usage 



/N ★ 



y A. . - J (a. .+B . -x . ) j 
i 1J i v 1J 1J J 

J J 



i = 1 , . . . ,m 



(24) 



(5) Difference Between Actual and Optimal CPU Utilization 



U j ' V f j = U j ' (o ‘lj + »lj x j )/(v jO + [ v jk x k>- j = 1 n 



A ^ ^ ★ 



(25) 



If a significant difference between actual and optimal operations 
exists, a sensitivity analysis could be performed with respect to the param- 
eters: computer capacities (b. ), user production requirements (N.), budget 

• J 

(B), charge rates (p.)> and utilizations (U-), in order to ascertain the 

J 

effect of parameter changes on total elapsed time and job mix. 

Solutions Obtained by Distinguishing Among Hourly Operating Periods 

In order to achieve sufficient accuracy in the estimates of linear 
program coefficients, as obtained from regression equations, it may be neces- 
sary to make the estimates for each hour of the daily operation. This corre- 
sponds to using twenty-four sets of observations, with each set corresponding 
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to all observations in the given one hour period over all the days which 
comprise the sample. This is in contrast to the above formulation which makes 
no distinction of observations according to hour of the day. 

The linear program would be changed by modifying (17), (13), (14), 

(19), (20) and (5), as follows, respectively 

I 

j 

X jh ^ N jh’ j = l,...,n; h = 1,...,24 and N^ h £ 0, (27) 



Minimize Z = £ 



v jkh x kh’ 



(26) 



^ | l a ijh P ih x jh * 24B ’ 



(28) 



I l B ijh x jh * 24 Vi " l l a ijh’ 1 " 

h j J J h j J 



(29) 



A A " x> 

U jh Vjkh x kh " 6 ljh ^ a ijh " U jh v j0h, j = 1 ,. .. ,n; h = l,...,24. (30) 



where h refers to the hour of the day. The optimal solution becomes 



* 



X 
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* ★ * * 
,X lh’ • • • » x i24’ x jl’---’ x jh 



★ ★ * ★ 

’ x j24 ; x nl ‘ ,x nh’ ‘ * ' ,x n24‘ ^ 31 ' 



This approach would obviously require significantly more computation 
and definition of parameters but would provide greater solution accuracy and 
allow a distinction to be made, by hour, with respect to production require- 
ments (Nj^) and resource prices (P-,^)* 



CORRELATION ANALYSIS 



The degree of linear association between performance and resource 
usage variables and among resource usage variables can be measured by simple 
and multiple correlation coefficients. Although no cause and effect relation- 
ships can be attributed to the existence of high correlation coefficients, 
the measures do have several uses. One use is to indicate which variables 
should be related in regression equations for the purpose of using linear 
functions to forecast performance or resource usage. Secondly, since this 
paper is concerned with mul ti programmed systems, it is of interest to identify 
job mixes which contain types of programs with highly correlated variables. 

This information identifies the job mixes for which linear functions can be 
used to forecast performance and resource usage, when the mix is executed in 
a multi programmed system. Similarly, job mixes which contain variables with 
low correlations cannot be analyzed with linear functions. Finally, the corre- 
lation coefficients indicate whether the performance of a given program type 
can be forecasted with linear functions by using only its variables, or whether 
it is necessary to use other program variables. This determination is made 
by comparing correlation coefficients within program types to correlation 
coefficients between program types. The existence of job mixes with high 
correlations or low correlations suggests but does not prove the existence of 
non-compatible and combatible mixes, respectively. 

The correlation analysis is related to the mathematical program of the 
previous section because it provides a basis for developing the regression 
equations which are used to obtain estimates of the mathematical program 
coefficients. The regression analysis is described in the next section. Only 
relationships which are linear in the independent variables are discussed in 
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this section. The applicability of polynomial functions is considered in the 
regression analysis section. Initially, it was of interest to examine the 
applicability of linear functions because: (1) if this approach proved adequate, 

the functions could be used in the linear program, (2) linear functions may 
suffice as an adequate approximation to the true performance and resource usage, 
and (3) greater forcasting accuracy may be attainable by using many independent 
variables in a linear regression equation as compared to using fewer variables 
in a polynomical regression equation. Functions which are non-linear in the 
parameters were not used because a significant number of data values were zero, 
since all program types are not executed during each time period. The absence 
of these values must be recorded as zeros when performing regression analysis 
with respect to the many programs which constitute the job mix in a multipro- 
gramming mode. In order to perform a logarithmic transformation of the variables, 
when zero values are present, it would be necessary to use the form log (x+a), 
where "a" is a constant. This is not a problem when program types are analyzed 
individually because it is not necessary to account for missing values. The 
focus of the investigation was on the development of a multiple variable model 
and the correlation and regression analysis of multiple program types in order 
to treat the variables as they exist in the actual multiprogramming environment. 
However, if it can be shown that only a limited number of variables is required 
to adequately forecast performance and resource usage, the complexity and compu- 
tational effort involved in solving the mathematical program and regression 
equations can be reduced. 

A description of the sample which was used in the correlation and 
regression analysis appears in Appendix I. A definition and identification 
of variables appears in Appendix II. A brief description of the NWC computer 
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configuration and the types of programs which were used in the analysis appear 
in Appendix IV. 

The data used in the correlation and regression analysis were produced 
by the system logging function at the NWC Computer Center. 

Sample correlation coefficients were computed by using the computer 
program "Correlation with Transgeneration, BMD02D" [9]. Prior to the use of 
the correlation routines, scatter plots were obtained for identifying the 
high correlation variables. 

The results which will be discussed are only a sample of the entire 
analysis which was performed. Correlation coefficients were obtained for 
six types of programs, both batch and demand. The data shown in the tables 
is intended to illustrate the type of analysis which has been performed and 
to indicate the applications of the data. The data shown is typical of the 
pattern of sample correlation coefficients. 

Correlation of Variables Within a Program Type 

The data presented in Tables 1 and 2 are typical of the results 
obtained for the twelve program types which were analyzed. That is, sample 
correlations are high except for the correlations between elapsed time and 
other variables, where the values are considered acceptable but not outstand- 
ing. This anomaly is attributed to the high percentage of wait time which 
is part of elapsed time (refer to Tables 13 and 14). As mentioned previously, 
wait time is a function of several variables. Some of these variables are 
associated with other program activities (wait for another program's CPU or 
I/O activities to complete). Thus, we would not expect to find a high corre- 
lation coefficient when only the resource usages of the given program type 
are related to elapsed time. None of the relationships between performance 
and resource usage or between resource usages should necessarily be linear. 



20 



Table 1 

Sample Correlation Coefficient Matrix - FORTRAN (Batch Programs) 

N = 108 hours 



v^ORTRAN 

FORTRAN^ 


Jobs 


CPU 

Time 


Core 


I/O Refs 


I/O Words 


El apsed 
Time 


Jobs 


1.0 


.962 


1 .0 


.991 


.996 


.709 


CPU Time 


.962 


1.0 


.962 


.977 


.970 


.707 


Core 


1.0 


.962 


1.0 


.992 


.996 


.709 


I/O Refs 


.991 


.977 


.992 


1.0 


.997 


.716 


I/O Words 


.996 


.970 


.996 


.997 


1 .0 


.710 


Elapsed 

Time 


.709 


.707 


.709 


.716 


.710 


1 .0 



Table 2 

Sample Correlation Coefficient Matrix - FORTRAN (Demand Programs) 

N = 108 hours 



^FORTRAN 

fortra?T\ 


Jobs 


CPU 

Time 


Core 


1/0 Refs 


I/O Words 


Elapsed 

Time 


Jobs 


1.0 


.816 


1.0 


.955 


.979 


.732 


CPU Time 


.816 


1 .0 


.817 


.941 


.908 


.650 


Core 


1 .0 


.817 


1.0 


.955 


.979 


.730 


I/O Refs 


.955 


.941 


.955 


1.0 


.993 


.713 


I/O Words 


.979 


.908 


.979 


.993 


1.0 


.745 


Elapsed 

Time 


.732 


.650 


.730 


.713 


.745 


1.0 



? l 



For example, CPU time per hour should be positively correlated with the amount 

★ 

of core used per hour and number of jobs executed per hour. However, the jobs 
of a given program type do not use equal amounts of CPU time, nor do programs 
of equal core size use equal amounts of CPU time. Thus, all correlation coeffi- 
cients are intended only to provide an indication of the feasibility of using 
a linear function as an approximation to the true function. 

In contrast to elapsed time, resource usage, as measured by CPU time, 
core usage, I/O references and I/O words transferred, is a function only of 
the variables of the given program type. Thus data of the type depicted in 
Table 1 and Table 2 are the only data necessary for indicating the degree of 
linear association among resource usage variables. For elapsed time, the 
correlations of elapsed with other program variables must also be examined. 

This topic is discussed in the next section. 

Correlation of Variables Across Program Types 

The data in Table 3 and Table 5 suggest that there is a greater degree 
of linear association between the elapsed time of a given program type (FORTRAN) 
and its variables than there is between elapsed time and other program variables. 
This is true for both batch (Table 3) and demand (Table 5) programs. In addi- 
tion, there is greater correlation between batch elapsed time and batch pro- 
gram variables than between batch elapsed time and demand program variables 
(Table 3). Similarly, there is greater correlation between demand elapsed time 
and demand program variables than between demand elapsed time and batch program 
variables (Table 5). Thus, the primary (high correlation) variables for an 
elapsed time linear regression equation would come from the given program type. 

★ 

It should be noted that certain of the program types analyzed, including 
FORTRAN compilations, use a constant amount of core storage (See Appendix I). 
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Table 3 



Sample Correlation Coefficients of FORTRAN 
with Resource Usage of Various 



(Batch) Elapsed Time 
Programs 



N = 108 hours 

Resource Usage 



CPU 



Batch Programs 


Jobs 


Time 


Core 


I/O Refs 


I/O Words 


FORTRAN 


.709 


.707 


.709 


.716 


.710 


Other 


.519 


.358 


.454 


-.141 


-.114 


FURPUR 


.368 


.259 


.365 


.175 


.153 


MAP 


.632 


.570 


.622 


.601 


.619 


PMD 


.069 


.132 


.083 


.092 


.116 


COBOL 


.227 


.111 


.226 


.076 


.086 


Demand Programs 












FORTRAN 


.400 


.379 


.399 


.402 


.411 


Other 


.449 


.402 


.438 


.339 


.337 


FURPUR 


.486 


.456 


.486 


.430 


.430 


MAP 


.425 


.404 


.424 


.399 


.423 


PMD 


.177 


.064 


.170 


.059 


.063 


COBOL 


.212 


.273 


.212 


.258 


.261 



Table 4 

Sample Correlation Coefficients of FORTRAN 
with Resource Usage of Various 

N = 108 hours 

Resource 

CPU 

Batch Programs Jobs Time Core 


(Batch) Wait 
Programs 

Usage 
I/O Refs 


Time 

I/O Words 


FORTRAN 


.653 


.646 


.653 


.659 


.653 


Other 


.475 


.346 


.417 


-.141 


-.110 


FURPUR 


.339 


.239 


.337 


.158 


.137 


MAP 


.586 


.523 


.576 


.556 


.573 


PMD 


.048 


.103 


.060 


.068 


.091 


COBOL 


.209 


.103 


.208 


.069 


.078 


Demand Program 
FORTRAN 


.385 


.364 


.384 


.386 


.395 
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Table 5 



Sample Correlation Coefficients of FORTRAN 
with Resource Usage of Various 



(Demand) Elapsed Time 
Programs 



N = 108 hours 



Resource Usage 



CPU 



Batch Programs 


Jobs 


Time 


Core 


I/O Refs 


I/O Words 


FORTRAN 


.218 


.232 


.218 


.224 


.222 


Other 


.355 


.214 


.273 


-.212 


-.111 


FURPUR 


.229 


.183 


.229 


.106 


.226 


MAP 


.405 


.304 


.403 


.323 


.259 


PMD 


-.018 


.179 


-.017 


.175 


.167 


COBOL 


.278 


.136 


.276 


.074 


.092 


Demand Programs 












FORTRAN 


.732 


.650 


.730 


.713 


.745 


Other 


.614 


.623 


.562 


.553 


.487 


FURPUR 


.595 


.507 


.596 


.510 


.583 


MAP 


.606 


.579 


.606 


.565 


.580 


PMD 


.041 


.043 


.048 


.037 


.040 


COBOL 


.404 


.373 


.403 


.378 


.388 



Tabl 

Sample Correlation Coefficients 
with Resource Usage 

N = 108 
CPU 

Batch Programs Jobs Time 


e 6 

of FORTRAN (Demand) 
of Various Programs 

hours 

Resource Usage 
Core I/O Refs 


Wait Time 
I/O Words 


FORTRAN 


.211 


.225 


.211 


.217 


.214 


Other 


.349 


.213 


.267 


1 

ro 

o 


-.109 


FURPUR 


.226 


.183 


.227 


.107 


.228 


MAP 


.398 


.297 


.396 


.316 


.351 


PMD 


-.017 


.176 


-.017 


.175 


.166 


COBOL 


.275 


.134 


.273 


.072 


.090 


Demand Program 
FORTRAN 


.721 


.632 


.720 


.699 


.732 
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However, in order to obtain sufficient regression equation forecasting accuracy, 
it may be necessary to include secondary (lower correlation) variables from 
other program types. The effects of the secondary variables are an indication 
of the impact of job mix and conflicting resource usage on performance. 

Similar results were obtained with respect to the use of wait time as 
a performance variable, as shown in Table 4 for batch FORTRAN compilation and 
in Table 6 for demand FORTRAN compilation. The indication is that elapsed 
time will provide a more accurate predictor of performance than wait time when 
using a linear regression function. This is due to the fact that wait time 
varies in a complex manner because of the influence of other program activities 
under multiprogramming, whereas elapsed time contains CPU time which will later 
be seen to vary more linearly than wait time with resource usage. 

An examination of Tables 3, 4, 5 and 6 indicates that in most cases, for 
FORTRAN compilations, there appears to be no appreciable difference in correlations 
by resource usage for the batch and demand program categories. This examina- 
tion involves comparing the correlation coefficients across columns within the 
batch and demand groups. Also, there appears to be no significant difference 
among resource usage correlations per given program type (examine the data 
within rows). As a consequence of the absence of difference in correlation 
by variable, we are led to consider the use of the given program's number of 
jobs as the primary variable for estimating elapsed time and the use of other 
programs' number of jobs as the secondary variables. The primary variable is 
related to the CPU time and wait time attributed to the execution of the given 
program and the secondary variables are related to the wait time attributed 
to the execution of other programs. Other reasons for using number of jobs to 
forecast elapsed time are: 



25 



(1) it is not necessary to estimate this variable from other variables, and 

(2) these are the solution variables (x.) of the linear program. 

J 

Certain correlations, which may appear to be significant, require 
careful interpretation. An example is the correlation between FORTRAN 
compilation elapsed time and MAP CPU time (Table 3). This value (.570) may 
appear to be related to the delay in FORTRAN compilation caused by MAP use of 
the CPU, whereas the more likely explanation is the correlation between the 
two independent variables (.795 between batch FORTRAN number of jobs and batch 
MAP number of jobs). This coefficient does not appear in Table 3. Thus, a 
large amount of FORTRAN elapsed time occurs during the same hourly interval as 
a large amount of MAP CPU time because a large number of FORTRAN and MAP jobs 
are executed during the same period. A possible method ‘for removing the effect 
of number of jobs would be to normalize the variables to be correlated by 
dividing the values by number of jobs. However, this is infeasible because 
many number of job values are zero. If values are restricted to non-zero 
quantities, the intersection of the non-zero values of two or more program 
types is a sample size which is too small for meaningful statistical analysis. 

Correlation of Elapsed Time with Number of Jobs 

For reasons which have been given previously it is highly desirable to 
use only number of jobs as the independent variable in the regression equa- 
tions. The data in Table 7 provide one indication of the feasibility of this 
approach. The following characteristics emerge: 



26 



Sample Correlation Coefficient Matrix - Elapsed Time vs. Number of Jobs 

N = 108 hours 
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Signifies highest correlation other than with own jobs. 
Signifies lowest correlation other than with own jobs. 



• Correlations of elapsed time with the given program's number of jobs are 
generally higher than with other programs' number of jobs. 

• Batch programs have higher correlations with batch number of jobs than with 
demand number of jobs. Similarly, demand programs have higher correlations 
with demand number of jobs than with batch number of jobs. 

• As was mentioned in the previous section, the given program will be the 
source of the primary variable in an elapsed time regression equation and 
the other programs will be the source of the secondary variables. 



Correlation of Elapsed Time with CPU Time 

A comparison of Tables 7 and 8 reveals that in many instances the corre- 
lation of elapsed time with CPU time is higher than with number of jobs. For 
this reason, it may be possible to achieve greater accuracy in the forecasting 
of elapsed time for some program types by using CPU time as the "independent" 
variable. Since CPU time is one of the variables to be forecasted, its value 
must be estimated from number of jobs or from this variable and the amount of 
core storage used. The correlations of CPU time with these variables are 
shown in Table 9. 

Correlation of Resource Usage Variables 
A. CPU Time 

The data in Tables 9 through 12 are presented in order to show the 
relationships among variables which are required in order to forecast four 
resource usage variables: CPU time, core storage usage, number of I/O refer- 

ences and number of I/O words transferred. In the discussion which follows 
all references to variables are with respect to a given program type. It is 
hypothesized that CPU time per hour is primarily a function of two variables: 
number of jobs per hour and amount of core used per hour. The former is a 
measure of the frequency of use of the CPU and the latter is a measure of the 
duration of CPU use as indicated by program size. As mentioned previously. 
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Table 8 



Sample Correlation Coefficients of Program Elapsed Time 
with that Program's CPU Time 





N = 108 hours 
Batch 


Demand 




Programs 


Programs 


FORTRAN 


.707 


.650 


Other 


.714 


.849 


FURPUR 


.738 


.707 


MAP 


.808 


.785 


PMD 


.718 


.576 


COBOL 


.803 


.809 







Table 9 






Sample Correlation 


Coefficients of Program CPU Time 


with 1 


Program' 


1 s Number 


of Jobs and 


Core Usage 






N = 


108 hours 








Batch Programs 


Demand Programs 




Jobs 


Core 


Jobs 


Core 


FORTRAN 


.962 


.962 


.816 


.817 


Other 


.519 


.459 


.832 


.829 


FURPUR 


.711 


.714 


.753 


.758 


MAP 


.905 


.916 


.963 


.963 


PMD 


* 

.423 


.472 


* 

.227 


.242 


COBOL 


.753 


.755 


.886 


.887 



★ 

Small number of jobs. 



Table 10 

Sample Correlation Coefficients of Program Core Usage 
with that Program's Number of Jobs* 

Batch Demand 

Programs Programs 

Other .937 .949 

All other programs use approximately a constant amount of storage 
per job. Therefore, correlation coefficients «1. 
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this is not a linear function. However, a linear function appears to be a 
satisfactory approximation to the true function. For certain program types, 
such as compilations, the amount of core storage is a constant. In this case 
the known values of core storage and number of jobs can be used directly in 
the regression equation. In the case of production programs, the core storage 
usage is a variable and must be estimated from number of jobs. The correla- 
tions of CPU time with number of jobs and core storage usage are shown in 
Table 9. 

B. Core Storage Usage 

The correlations of production program core storage usage (OTHER) with 
number of jobs are shown in Table 10. Since there is a great variety of pro- 
duction jobs at NWC with a variety of core storage requirements, the high 
linearity between core storage usage and number of jobs is surprising. This 
is possibly explained by the allocation of core storage to a program in fixed 
size blocks, in which the number of region sizes is limited. This procedure 
reduces the core storage size variability. 

C. I/O References 

It is hypothesized that number of I/O references per hour is primarily 
a function of CPU time per hour and core storage used per hour. The former is 
the time during which it is possible to initiate an I/O reference. The latter 
is a measure of frequency of occurrence of I/O instructions as indicated by 
program size. Again, this function would not be strictly linear because I/O 
references do not occur in strict proportion to program duration and I/O com- 
mands do not appear in programs in strict proportion to program size. The 
correlations between I/O references and CPU time are shown in Table 11. Corre- 
lations between I/O references and core storage usage were not consistently high. 
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Table 11 



Sample Correlation Coefficients of Program I/O References 
with that Program's CPU Time 

N = 108 hours 

Batch Demand 

Programs Programs 



FORTRAN 


.977 


.941 


Other 


.186 


.531 


FURPUR 


.745 


.805 


MAP 


.959 


.997 


PMD 


.921 


.987 


COBOL 


.969 


.994 



Sample Correlation 


Table 12 

Coefficients of Program 


I/O Words 


Transferred 


with that Program's CPU 


Time 




N = 108 hours 
Batch 


Demand 




Programs 


Programs 


FORTRAN 


.970 


.908 


Other 


-.086 


.510 


FURPUR 


.753 


.848 


MAP 


.966 


.991 


PMD 


.943 


.980 


COBOL 


.969 


.993 
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The CPU time which would be used in a regression equation for forecasting 
I/O references would be estimated from number of jobs and amount of core 
storage required. 

The production program (OTHER) correlations in Table 11 are low. A 
relatively low correlation coefficient for production batch jobs also appears 
in Table 9. These low values appear to be caused by the heterogeneous 
characteristics of production jobs as contrasted to the more uniform charac- 
teristics of a compiler or utility program. 

D. I/O Words Transferred 

Since words are transferred each time an I/O reference is made, the 
two variables would be positively correlated. In fact, the sample correla- 
tion coefficients are high except for the OTHER category. These coefficients 
are not shown in the tables. Since I/O references would be forecasted by 
using CPU time, as indicated above, the correlations between CPU time and 
I/O words transferred are shown in Table 12. 

Distribution of Resource Usage 

Tables 13 and 14 show mean values of CPU time, wait time, elapsed time 
and CPU utilization on a per hour and job basis, respectively. These tables 
demonstrate that the "other" category (production programs) dominate computer 
usage. This is unfortunate because it was not possible to obtain high corre- 
lation coefficients for this category due to the heterogeneity of production 
programs . 
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Table 13 
N = 108 hours 



Mean Values 





CPU Time 


Wait Time 


Elapsed Time 


CPU 


Batch 


Sec/hr 


Sec./hr 


Sec/hr 


Uti 1 i zation 


FORTRAN 


34.11 


266.68 


300.79 


.113 


Other 


1 ,203.45 


5,825.80 


7,029.25 


.171 


FURPUR 


6.12 


461.79 


467.91 


.013 


MAP 


27.06 


213.84 


240.90 


.112 


PMD 


3.80 


5.69 


9.49 


.400 


COBOL 


36.09 


148.53 


184.62 


.195 




1,310. 


63 6,922. 


.33 8,232 


.96 .159 


Demand 


FORTRAN 


10.58 


343.70 


354.28 


.030 


Other 


127.83 


3,196.36 


3,324.19 


.038 


FURPUR 


5.82 


1 ,608.04 


1 ,613.86 


.004 


MAP 


14.27 


220.70 


234.96 


.061 


PMD 


.84 


5.44 


6.28 


.134 


COBOL 


26.35 


176.72 


203.07 


.130 



185.69 5,550.96 5,736.65 .032 



1,496.32 12,473.29 13,969.61 .120 



Table 14 



N_ 108 hours 

Mean Values 





CPU Time 


Wait Time 


Elapsed Tii 




Per Jen 


Per Job 


Per Job 


Batch 


(sec , 


(sec) 


Isec) . 


FORTRAN 


1.43 


11.19 


12.62 


Other 


72.02 


348.64 


420.66 


FURPUR 


.42 


32.05 


32.47 


MAP 


2.99 


23.60 


26.59 


PMD 


1.87 


2.80 


4.67 


COBOL 


16.33 


67.21 


83.54 


Demand 


FORTRAN 


i.6a 


51.84 


53.44 


Other 


8.83 


220.90 


229.73 


FURPUR 


.27 


75.57 


75.84 


MAP 


2.35 


36.30 


38.65 


PMD 


1.58 


10.27 


11.85 


COBOL 


8.67 


58.13 


66.80 
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REGRESSION ANALYSIS 



Previous Studies 

Reference [6] makes the point that a major difficulty in the evaluation 
of computer systems has been the inability to provide a valid quantification 
of the relationship between performance and workload. It is also mentioned 
that the problem of the presence of a large number of variables is compounded 
when the interactions of these variables must be taken into account. It is 
further stated that although the application of regression analyses to 
computer performance has not been extensive, it will become a major analysis 
tool. A regression equation can serve as a predictor of performance and 
resource usage, within the range of the variables which were used to estimate 
the regression coefficients. A disadvantage results from treating computer 
functions as black boxes, where the regression variables are black box inputs 
and outputs. This macro approach fails to deal with the internal structure 
of computer functions. The characteristics of the internal structures may be 
important determinants of computer performance. Also, it is possible to have 
a very good fit between dependent and independent variables without a cause 
and effect reason for the relationship. The goodness of the fit may mislead 
one to believe that the mathematical relationship implies a physical 
relationship. 

Regression analysis has been applied or studied for the evaluation of 
computer systems in several instances. In [3] CPU utilization was the depend- 
ent variable and number of instructions executed/number of bytes transferred 
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and CPU channel overlap were the independent variables which provided the best 
fit. Data were collected by hardware monitor on the IBM 360/85. Reference [4] 
reports several regression relationships: percent non-overlapped time between 

CPU and I/O versus CPU to I/O time ratio; channel/channel overlap versus 
number of initiators; unoverlapped disk seek time versus number of channels 
and number of initiators; and channel busy time versus supervisor CPU time and 
number of channel program executions. No information was provided concerning 
the application of these equations to actual computer systems. Two interesting 
efforts [5, 8] were concerned with estimating CPU overhead, average throughput 
and maximum throughput in a time-sharing environment (CP-67). The CPU time 
consumed by the operating system was related to various event counts, such as 
number of I/O instructions issued, number of interrupts and number of pages 
read. Reference [7] describes the use of regression analysis to estimate the 
effect on system reaction time of the number and size of core regions used 
for APL programs in an IBM 360/91 operating under OS/MVT. Reaction time is 
made a function of workload variables such as number of conversational inputs, 
log ons and large CPU requests per hour; CPU utilization; and number and size 
of core workspaces. According to this study, the workspace size was more 
important than number of workspaces as a determinant of system reaction time. 

An interesting aspect of this study was the use of regression to produce a 
cumulative distribution function of system reaction time. 

The above studies provide useful background for analyzing the subject 
of this paper. However, the thrust of the studies is the measurement and 
forecasting of a system (CPU overhead) performance variable as a function of 
various system or user activities. In addition, the studies make no distinc- 
tion among the various types of programs with regard to performance and 
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resource usage. This paper is concerned with the measurement and forecasting 
of user performance variables (elapsed time or wait time) as a function of 
resource usage and input load for each type of program in a multi programmed 
mode. 

Development of Regression Equations 

As a result of the correlation analysis, certain high correlation 
variables emerged as candidates for the regression equations. Usually one or 
two "independent variables" dominate in the sense of being the major contribu- 
tors to the explained variation about the mean due to regression (multiple 
correlation coefficient r 2 ). The following results of the correlation 
analysis provided guidelines for developing the regression equations: 

• Higher correlations were achieved within batch or demand program categories 
than between these categories. 

• Within the batch or demand program categories, higher correlations were 
achieved for variables within a program type than between program types. 

• Higher correlations were achieved for elapsed time than for wait time. 

• Number of jobs and CPU time were identified as the primary variables for 
predicting elapsed time by means of a regression equation. 

• Overall, the best single independent variable is number of jobs because 
its value would be known and would not have to be estimated. 

Only regression equations for the performance variable, elapsed time, 

were developed. This variable was emphasized because it proved to be the 

most difficult variable to forecast, since (1) the correlation coefficients 

were not high, and (2) elapsed time is a function of the variables of many 

program types. In most cases the resource usage regression equations will be 

easy to develop because (1) the correlation coefficients are high, and (2) 

resource usage is a function only of the variables of the given program type. 
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The following approaches to regression analysis were utilized: 

(1) Linear regression. This represented a subset of stepwise linear 
regression results, wherein the regression equation obtained up to 
a point in the stepwise routine was utilized. This approach was 
used to determine whether adequate forecasting accuracy could be 
obtained with only a few variables. 

(2) Stepwise linear regression using computer program BMD02R [9]. This 
approach was used to investigate the possibility of improving the 
forecast by using many variables. 

(3) Polynomial regression using computer program BMD05R [9], 

The last approach was used to determine whether a single variable equation which 
was non-linear in the independent variables could provide better accuracy 
than a linear equation with multiple variables. 

A summary of the regression equations which were developed appears in 
Table 15. The details of a subset of these equations (those with r > .7) 
appears in Appendix III. 

Several measures of the adequacy of the regression equations for fore- 
casting purposes were utilized as follows: 

• Sample multiple correlation coefficient r. 

• Coefficient of determination r 2 , the proportion of total variation 
about the mean of the dependent variable explained by the regression. 

• Sum of the squared residuals SS. 

• Residual mean square error of residuals SS/df, where df is the degrees 
of freedom. If the regression model is correct, this statistic provides 
an estimate of the error variance. 

• An examination of residuals (difference between observed and predicted 
values of elapsed time) in order to determine whether the following two 
assumptions of a regression model are satisfied: 

(1) whether the residuals are normally distributed, 

(2) whether the residuals have constant variance. 

• If r £ .7 is used as one of the criterions for an acceptable regression 
equation, only some of the program types have regression equations which 
are acceptable. 
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Table 15 

Sample Regression Equation Data 

Elapsed Time as a Function of Resource Usage 
and/or Number of Jobs 

N = 108 hours 
Batch Programs 



I.D. 


Program 


Reg 


Indep 

Var 


No of 
Indep 


r 


r 2 


SS 

x 10" 6 


(SS/df) 
x 10" 3 


F 




Type 


Type 


Var 












1 


FORTRAN 


L 


1 


1 


.709 


.503 


6.843 


64.55 


107.36 


2 


FORTRAN 


L 


3 


1 


.716 


.513 


6.710 


63.30 


111.58 


3 


FORTRAN 


S 


1,4 


7 


.731 


.534 


6.423 


64.23 


16.35 


4 


FORTRAN 


s 


1,2,3, 

4,5,6 


9 


.733 


.537 


6.377 


65.07 


12.63 


5 


FORTRAN 


P2 


1 


1 


— 


-- 


6.209 


59.13 


63.96 


6 


FORTRAN 


S 


1,4 


10 


.743 


.551 


6.180 


63.71 


11.92 


7 


FORTRAN 


S 


1,2,3, 

4,5,6 


14 


.754 


.569 


5.935 


63.82 


8.77 


8 


Other 


P2 


1 


1 


— 


-- 


1 ,739 


16,562 


26.34 


9 


Other 


P3 


1 


1 


— 


-- 


1,730 


16,634 


17.67 


10 


Other 


S 


1 ,4, 
5,6 


14 


.624 


.390 


1 ,594 


17,142 


4.24 


11 


Other 


S 


1,4 


11 


.633 


.401 


1,565 


16,301 


5.84 



I.D.: Regression equation identification used in Appendix III. Equations 

with r > .7 are shown in Appendix III. 

Regression Type: L = linear; PX = polynomial of degree x; S = stepwise. 

Independent Variable Type 

1. This program's input load (no. of jobs, core usage). 

2. This program's CPU time. 

3. Wait for this program’s I/O to complete. 

4. Other program's input load (no. of jobs, core usage). 

5. Wait for CPU to become available. 

6. Wait for I/O to become available. 

r: Sample coefficient of multiple correlation. 

SS: Sum of squared residuals, 

df: Degrees of freedom. 

SS/df: Mean square error of residuals. 

F: Ratio of sum of squared deviations due to regression to sum of squared 

residuals. 
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I.D. 

12 

13 

14 

15 

16 

* 

17 

18 

19 

20 



21 

22 

23 

24 



N 



Table 15 
(Conti nued) 

N = 108 hours 

Batch Programs 



Program 


Reg 

Type 


Indep 

Var 

Type 


No of 
Indep 
Var 


r 


r 2 


SS 

x 10' 6 


(SS/df) 
x 10" 3 


F 


FURPUR 


S 


1,4 


7 


.680 


.462 


13.796 


137.96 


12.26 


FURPUR 


S 


1,4 


12 


.683 


.466 


13.680 


142.50 


7.63 


MAP 


S 


1,4 


2 


.765 


.586 


3.144 


29.94 


74.21 


MAP 


S 


1,4 


7 


.778 


.605 


2.996 


29.96 


21.90 


MAP 


S 


1 ,4 


10 


.791 


.626 


2.842 


29.30 


16.20 


MAP 


S 


1,3 


3 


.831 


.690 


1 .742 


22.93 


56.51 


MAP 


S 


1,3 


3 


.876 


.767 


1.765 


16.97 


114.40 


PMD 


S 


1,4 


10 


.604 


.365 


.042 


.43 


5.58 


COBOL 


S 


1,4 


12 


.577 


.333 


12.407 


130.60 


3.95 



Demand Programs 



FORTRAN 


S 


1,4 


11 


.790 


.624 


15.240 


158.75 


14.44 


Other 


S 


1,4 


12 


.884 


.781 


364.046 


3,832 


28.16 


FURPUR 


s 


1,4 


12 


.849 


.721 


161 .119 


1 ,696 


20.48 


MAP 


s 


1,4 


11 


.789 


.622 


6.504 


67.750 


14.36 



80 
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In most cases the use of a large number of variables is not warranted, 
based on : 

(1) the small increases in r 2 achieved (Table 15) by adding variables, 
particularly variables from other program types, and 

(2) the fact that minimum SS/df does not necessarily occur when the 
largest number of independent variables is used (Table 15). 

Analysis of Residuals 

An analysis of residuals is shown in Table 16. The assumptions of the 
regression model are that the residuals are normally distributed with zero 
mean and constant variance. An approximate check of normality was made (Table 
16) by determining the percentage of residuals which were within one and two 
standard deviations of the mean. The data in Table 16 indicates non-normality 
of the residuals at one standard deviation. The lack of normality prevents 
the use of F tests for testing: 

(1) the hypothesis that all regression coefficients are zero or, equivalently, 
that the regression equation is no better for forecasting elapsed time 
than the mean value of elapsed time (using the F values in Table 15), and 

(2) the hypothesis that individual regression coefficients are zero by cal- 
culating F = (regression coefficient) 2 /(standard error of regression 
coefficient) 2 from the data in Appendix III. 

As indicated in Appendix III, in many instances the standard error is 
much greater than the regression coefficient, indicating that the associated 
variable should not be included in the regression equation. Fortunately, this 
is usually not the case for the primary (high correlation) variables. 

Computer plots were made of residuals versus the independent variables. 
These plots show that the residuals increase with increasing values of the inde- 
pendent variable of the given program type. Thus, the constant variance assump- 
tion is violated. This is caused by the increase in the variance of wait time 
(the major component of elapsed time) which occurs when a large number of jobs 
is resident in the system. The build up in queues causes an increase in the 
variability of wait time. The data in Table 16 also indicate that the regression 
equations consistently overestimate the actual values of elapsed time. 
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Table 16 



Analysis of Residuals 
N = 108 hours 



No. of 



:.d. 


Program 


Reg 

Type _ 


Indep. 

Var. 


S.D. 


% Residuals 
Bet ±1S.D. 


% Residuals 
Bet ±2S.D. 


% Negative 
Residuals 


3 


FORTRAN 

Batch 


S 


7 


253 


86 


94 


73 


4 


FORTRAN 

Batch 


S 


9 


255 


86 


94 


75 


5 


FORTRAN 

Batch 


P2 


1 


243 


84 


94 


75 


6 


FORTRAN 

Batch 


S 


10 


252 


85 


94 


74 


7 


FORTRAN 

Batch 


S 


14 


253 


85 


95 


73 


16 


MAP 

Batch 


S 


10 


171 


84 


96 


61 


21 


FORTRAN 

Demand 


S 


11 


398 


80 


94 


63 


22 


Other 

Demand 


S 


12 


1958 


76 


97 


58 


23 


FURPUR 

Demand 


s 


12 


1302 


81 


95 


56 


24 


MAP 


s 


11 


259 


93 


97 


51 



Demand 



I.D.: Regression equation identification used in Table 15. 

Regression type: S = stepwise; P2 = polynomial of degree 2. 

S.D. = standard deviation of residuals. 
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In order to produce more accurate forecasts of 




one of the 



following approaches for obtaining the regression equations could be employed: 



(1) Use a linear weighted least square model for the purpose of achieving 
constant variance residuals. 



(2) Use a model which is non-linear in the parameters in order to provide a 
better fit to the data (reduce the large number of negative residuals). 

(3) Decrease the variability of elapsed time by partioning the observations 
into 24 sets, where each set is all the observations in a given one hour 
period over all the days which constitute the sample. The variability 
of observations should be reduced by using this method since the input 
to the system varies considerably by hour within each day, but the daily 
pattern of usage is consistent. However, the amount of computation is 
expanded significantly, being multiplied by a factor of 24. This pro- 
cedure would correspond to using the linear programming formulation of (26) 
through (31). 



(4) Since it was observed from Table 9 that CPU time A-^. can usually be 

forecasted more accurately than elapsed time T^, a CPU time regression 

equation A,. = h-(x.) could be developed and used to forecast T. from the 
I J /\ J J ^ J 

relationship T. = h.(x.)/U., where the CPU utilization for program type 

j, U-, is estimated from U . = )A, ./)J . over the jobs which constitute 
J J * J J 



the historical sample for program type j. 

After obtaining T., by one of the above methods, and 

J 






it would 



be of interest to forecast U. from the relationship U. = A-../T.. 

J J ' J J 



SUMMARY 



Research Efforts 

Three major efforts have been discussed. The first effort dealt with 
the development of a mathematical programming model for determining the optimal 
mix of jobs to run in a computer center. The purpose of the second effort, 
correlation analysis, was to estimate the degree of linear association among vari- 
ables. This was done in order to identify, for a multiprogramming operation, 
the high correlation variables and to confirm or reject intuitive notions 
regarding the linear association of variables. The identification of high 
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correlation variables led to the use of these variables in the third effort, 
regression analysis. 

Applications 

There are two applications of the regression analysis. One application 
involves using the regression equations in the formulation of the linear 
program. In this instance, the linear program objective function and con- 
straints are not deterministic with constant coefficients. Rather, the objec- 
tive function and constraints are functions of random variables. These func- 
tions are derived statistically in the form of regression equations, where 
the estimates of the dependent variables (performance or resource usage variables) 
are least squares estimates of the objective and constraints functions. The 
regression equation coefficients are the coefficients of the objective func- 
tion and constraints. These coefficients are not constants as they would be 
in the ordinary linear programming formulation. Rather, these coefficients 
are least square estimates of random variables. 

The second application of the regression equations is to make various 
forecasts and estimates as indicated below. 

• Forecast the effect on performance of various resource usages and job 
mi xes . 

• Forecast resource usage as a function of input load Xj . 

• Estimate the input load which would cause resource usage to equal or exceed 
resource capacity (saturation). 

• Forecast the input load and resource usage which would cause unacceptable 
performance. 

Once the linear program is available, the following types of analyses 
could be made: 

• Compute the optimal job mix. If it is feasible to implement this job mix, 
improved computer performance should be possible. 



• Attempt to validate the model by comparing the actual total elapsed time 
of job mixes which approximate the optimal job mix with the actual total 
elapsed time of non-optimal job mixes. 

• Compare the actual job mix with the optimal job mix and compare actual total 
elapsed time with optimal total elapsed time. If the actual time exceeds 
the optimal time by a large amount, use the linear program to estimate the 
effect on performance and job mix of hardware changes (resource capacities), 
budget changes and differential pricing. In order to ascertain the effect 
of pricing which is based both on type of resource and time of use, it 
would be necessary to use the model given by (26) through (31). 

• Determine the effect on optimal job mix and performance of changes in 
user production requirements. 

Although the effect on performance of changes in the operating system 
cannot be evaluated by using the linear program, it may be possible to use 
the optimal job mix as the objective to be achieved by a change in operating 
system job scheduling and task dispatching priorities. 

One of the benefits of the above analyses is to estimate the effect of 
a change in operation prior to the implementation of the change. For example, 
in view of the expense of making hardware changes, it would be advantageous 
to use the model as a means of estimating future performance and of justifying 
expenditures when equipment requests are made to the Navy's Automatic Data 
Processing Equipment Selection Office. In addition, since the Government 
Accounting Office has made CPU utilization a primary means of evaluating the 
effectiveness of Federal computer centers, the model could be used to estimate 
the effect of various changes on CPU time and elapsed time and, hence, on CPU 
utilization. In addition, the CPU times and elapsed times which correspond 
to the minimum CPU utilization constraints in the linear program, would be 
obtained from a solution of the linear program. Furthermore, when utilization 
constraints are not specified, optimal values of CPU utilization can be estimated 
from the optimal values of CPU time and elapsed time obtained from solving the 
linear program. 
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It would be important to reformulate the regression equations and 
linear program, by using new sample data, when significant changes in the 
computer operation occur, such as a change in hardware configuration. 

Future Research Efforts 

In order to complete this model, additional work must be done to obtain 
more accurate regression equations for the performance variables. The tech- 
niques for achieving this result include (1) linear weighted least squares, 

(2) non-linear regression, (3) a reduction in variability by using more 
homogeneous sets of data, and (4) estimation of elapsed time from the CPU 
time regression equation and CPU utilization. If non-linear regression equa- 
tions are used, they must be separable functions in order to make the numerical 
solution of the mathematical program feasible. 

In addition to providing more accurate performance regression equations, 
it will be necessary to develop regression equations for the resource usage 
variables. 
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APPENDIX I 

Sample Data and Statistics 
N = 108 hours 



A. Number of Jobs in Sample 









Jobs 


per Hour 


Batch 


Total ,, * 
Jobs u; 


Percent 


Mean 


Standard 

Deviation 


FORTRAN 


2,574 


09.8) 


23.82 


26.94 


Other 


1 ,805 


(13.9) 


16.71 


12.94 


FURPUR 


1 ,556 


(12.0) 


14.41 


14.63 


MAP 


979 


(7.5) 


9.06 


8.32 


PMD 


219 


(1.7) 


2.03 


1.97 


COBOL 


239 


(1.8) 


2.21 


2.95 




7,372 


(56.7) 






Demand 










FORTRAN 


716 


(5.5) 


6.63 


8.88 


Other 


1 ,563 


(12.0) 


14.47 


16.98 


FURPUR 


2,298 


(17.7) 


21.28 


22.95 


MAP 


657 


(5.1) 


6.08 


7.95 


PMD 


57 


(.5) 


.53 


.92 


COBOL 


329 


(2.5) 


3.04 


4.83 




5,620 


(43.3) 







Total batch 

and demand 12,992 



(1) Total number of jobs executed during five day period for each 
program. The sample data were collected during the following 
dates of operation in 1973: 

July 6-7 
July 10-11 
July 17-18 
July 20-21 
July 27-28 
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APPENDIX I (continued) 
N = 108 hours 



B. CPU Time in Sample 

Total 

Time 



( 1 ) 



Secs per Hour 



Secs per Job 



Standard 



Batch 


(secs) 


Percent 


Mean 


Deviation 


Mean 


FORTRAN 


3,684 


(2.3) 


34.11 


38.82 


1.43 


Other 


129,972 


(80.4) 


1,203.45 


787.20 


72.01 


FURPUR 


661 


(.5) 


6.12 


7.59 


1 .09 


MAP 


2,922 


(1.8) 


27.06 


24.43 


2.98 


PMD 


410 


(.2) 


3.80 


7.78 


1 .87 


COBOL 


3,898 


(2.4) 


36.09 


50.38 


16.31 




141,547 


(87.6%) 








Demand 












FORTRAN 


1,142 


(.7) 


10.58 


18.21 


1.59 


Other 


13,805 


(8.5) 


127.83 


176.69 


8.83 


FURPUR 


628 


(.4) 


5.82 


8.79 


.27 


MAP 


1,541 


(-9) 


14.27 


17.76 


2.35 


PMD 


91 


(.1) 


.84 


4.01 


1.60 


COBOL 


2,846 


(1.8) 


26.35 


40.65 


8.66 




20,053 


(12.4) 








Total batch 
and demand 


161,600 










(1) Total 


CPU time used 


during five 


day period by each program. 
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APPENDIX I (continued) 
N = 108 hours 



C. Core Usage in Sample 
Total 



Kilo-words per Hour 



Kilo-words 
per Job 



Batch 


m iu m 

Words ' ' 


Percent 


Mean 


owiiudra 

Deviation 


Mean 


FORTRAN 


110,690 


(30.7) 


1 ,024.91 


1,158.67 


43.00 (2) 


Other 


59,855 


(16.6) 


554.22 


460.79 


33.16 


FURPUR 


19,298 


(5.4) 


178.69 


181.51 


12. 40^ 


MAP 


25,687 


(7.1) 


237.84 


217.90 


26.24^ 


PMD 


1 ,656 


(.5) 


15.33 


14.89 


7.56^ 2) 


COBOL 


8,987 


(2-5) 


83.21 


110.70 


37.60^ 




226,173 


(62.8) 








Demand 












FORTRAN 


30,699 


(8.5) 


284.25 


381 . 34 


43.88 (2) 


Other 


44.787 


(12.5) 


414.70 


511.88 


28.65 


FURPUR 


28,204 


(7.9) 


261.14 


281 .74 


12. 27^ 


MAP 


17,253 


(4.8) 


159.75 


208.89 


26. 26 2 ^ 


PMD 


427 


(-1) 


3.96 


6.96 


7.49^ 


COBOL 


12,351 


(3.4) 


114.36 


181 .46 


37 . 54 ^ 2 ^ 




133,721 


(37.2) 








Total batch 










and demand 


359,894 










(1) Total 


core usage during five day period by each 


program. 




(2) These 


programs have approximately 


constant core 


usage per job and 


usage 


is the same for 


batch and demand. 
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APPENDIX I (continued) 
N = 108 hours 



D. Input/Output References in Sample 

Refs per Hour Refs per Job 



Batch 


Total M , 
Refs ' ' 


Percent 


Mean 


Standard 

Deviation 


Mean 


FORTRAN 


169,167 


(2.1) 


1,566.36 


1 ,798.23 


65.76 


Other 


5,091 ,488 


(64.2) 


47,143.41 


42,091.38 


2,821.27 


FURPUR 


529,459 


(6.7) 


4,902.39 


6,668.04 


340.21 


MAP 


383.433 


(4.9) 


3,550.31 


3,083.04 


391.87 


PMD 


11,845 


(.1) 


109.68 


203.14 


54.03 


COBOL 


247,523 

6,432,915 


(3.1) 

(81.1%) 


2,291.88 


3,526.57 


1 ,037.05 


Demand 


FORTRAN 


61,112 


(.8) 


565.85 


815.77 


85.35 


Other 


729,423 


(9.2) 


6,753.91 


10,802.05 


466.75 


FURPUR 


272,274 


(3.4) 


2,521 ,06 


4,309.85 


118.47 


MAP 


235,269 


(3.0) 


2,178.42 


2,691.00 


358.29 


PMD 


4,482 


(.1) 


41.50 


171.24 


78.30 


COBOL 


193,264 

1,495,824 


(2.4) 

(18.9%) 


1,789.48 


2,726.97 


588.64 



Total batch 

and demand 7,928,739 



(1) Total number of I/O references made during five day period by each program. 
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APPENDIX I (continued) 
N = 108 hours 



E. Input/Output Transfers in Sample 

Kilo-Words per Hour 

Total /,\ Standard 

Batch Kilo-Words^ ' Percent Mean Deviation 



FORTRAN 


183,511 


(4.6) 


1,699.18 


1,956.51 


71.33 


Other 


2,450,657 


(61.4) 


22,691.27 


32,668.43 


1 ,357.95 


FURPUR 


310,467 


(7.8) 


2,874.70 


3,838.91 


199.49 


MAP 


313,617 


(3.3) 


1,218.67 


1 ,081.69 


134.51 


PMD 


3,552 


(.1) 


32.89 


58.09 


16.20 


COBOL 


127,758 


-(jjJD- 


1 ,182.94 


1 ,813.47 


535.27 




3,207,562 


(80.4) 








Demand 












FORTRAN 


61,370 


(1.5) 


568.24 


782.98 


85.71 


Other 


321,211 


(8.0) 


2,974.18 


5,048.76 


205.54 


FURPUR 


212,046 


(5.3) 


1,963.39 


2,899.68 


92.26 


MAP 


88,077 


(2.2) 


815.53 


1,018.94 


134.13 


PMD 


1,212 


(.1) 


11.23 


44.39 


21.19 


COBOL 


98,958 


(2.5) 


916.28 


1 ,398.79 


301.41 




782,874 


(19.6) 








Total batch 
and demand 


3,990,436 











(1) Total words transferred during five day period by each program. 



Ki lo-Words 
per Job 

Mean 



51 



APPENDIX II 

Variable Identification 



Program Type j 



Program 


Batch 


Demand 


FORTRAN 


1 


7 


Other 


2 


8 


FURPUR 


3 


9 


MAP 


4 


10 


PMD 


5 


11 


COBOL 


6 


12 



Resource i 






CPU Time 


1 


Seconds per hour 


Core Usage 


2 


Kilowords per hour 


I/O References 


3 


References per hour 


I/O Words Transferred 


4 


Kilowords per hour 



Elapsed Time Variables 

Total elapsed time in seconds per program type j per hr. = T. 

J 

Elapsed time in seconds per program type j per hr. per job = t. 

J 

Resource Usage Variables 

Total use of resource i by program type j per hr. = A. . 

1 J 

Use of resource i by program type j per hr. per job = a. . 

^ J 

Job Variables 

Number of jobs of program type j per hr. = x.. 

J 



52 



APPENDIX III 

Sample Regression Equations 
N = 108 hours 



Independent Variables 



D. 


Dep 

Var r 


Var 


-Jype 


p wi th 
Dep Var 


p wi th 
Own Jobs 


Reg 

Coeff 


Std Err 
Reg Coeff 


Inter 

cept 


1 


T 1 . 709 


X 1 


1 


.709 


1.0 


9.448 


.912 


75.62 


2 


T 1 .716 


A 31 


3 


.716 


.991 


.143 


.014 


76.99 


3 


T 1 .731 


X 1 


1 


.709 


1.0 


7.356 


1.596 


70.77 






X 2 


4 


.519 


1.0 


10.890 


7.677 








X 3 


4 


.368 


1.0 


4.670 


3.267 








X 4 


4 


.632 


1.0 


.757 


2.344 








X 5 


4 


.069 


1.0 


-9.243 


13.596 








X 6 


4 


.227 


1.0 


-5.516 


10.468 








X 7 


4 


.400 


1.0 


-3.286 


3.815 





I.D.: Regression identification referred to in Table 15. Only equations with 

r > .7 are shown here. 

r: Sample coefficient of multiple correlation, 
p: Sample coefficient of correlation. 

Independent Variable Type 

1. This program's input load (no. of jobs, core usage). 

2. This program's CPU time. 

3. Wait for this program's I/O to complete. 

4. Other programs' input load (no. of jobs, core usage). 

5. Wait for CPU to become available. 

6. Wait for I/O to become available. 
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I.D. 

4 



5 

6 



7 



Dep 

Var 



r 

.733 



.743 



.754 



APPENDIX III (continued) 



Independent Variables 



Var 


Type 


p with 
Dep Var 


p with 
Own Jobs 


Reg 

Coeff 


Std Err 
Reg Coeff 


Inter 

cept 


A 11 


2 


.707 


.962 


-.759 


3.802 


49.76 


A 21 


1 


.709 


1.0 


.083 


.256 




*3, 


3 


.716 


.991 


.268 


.206 






3 


.710 


.996 


-.178 


.217 




x 4 


4 


.632 


1.0 


10.579 


32.817 




A 14 


5 


.570 


.905 


-4.948 


4.331 




A 24 


4 


.622 


.995 


-.412 


1.299 . 




^34 


6 


.601 


.926 


-.011 


.052 




A 44 


6 


.619 


.959 


.206 


.235 




X 1 


1 


.709 


1 .0 


16.038 


2.194 


13.98 


x l2 


1 


-- 


-- 


-.074 


.023 




X 1 


1 


.709 


1.0 


6.224 


1.691 


48.75 


x 2 


4 


.519 


1.0 


-3.412 


3.955 




x 4 


4 


.632 


1.0 


13.434 


7.829 




x 5 


4 


.069 


1.0 


-2.198 


13.864 




x 6 


4 


.227 


1.0 


-6.925 


10.488 




X 8 


4 


.339 


1.0 


-1.034 


3.799 




x 9 


4 


.486 


1.0 


2.379 


3.241 




X 10 


4 


.425 


1.0 


7.467 


8.340 




X 11 


4 


.177 


1.0 


21.727 


28.740 




X 1 2 


4 


.212 


1.0 


-11.115 


9.112 




A 11 


2 


.707 


.962 


.925 


4.056 


65.70 


A 21 


1 


.709 


1.0 


.074 


.266 




A 3, 


3 


.716 


.991 


.278 


.208 




A 41 


3 


.710 


.996 


-.207 


.223 




x 2 


4 


.519 


1.0 


-8.705 


7.574 




A 22 


4 


.454 


.937 


.059 


.173 




X 4 


4 


.632 


1.0 


3.060 


35.777 




A 14 


5 


.570 


.905 


-5.521 


4.688 




A 24 


4 


.622 


.995 


.175 


1.390 
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APPENDIX III (continued) 
Independent Variables 



:.d. 


Dep 

Var 


r 


Var 


Type 


P wi th 
Dep Var 


P wi th 
Own Jobs 


Reg 

Coeff 


Std Err 
Reg Coeff 


Inter- 

cept 








A 44 


6 


.619 


.959 


.154 


.164 










x 8 


4 


.449 


1.0 


-6.320 


6.136 










A 28 


4 


.438 


.949 


.240 


.175 










A 29 


4 


.486 


1.0 


.075 


.265 










X 10 


4 


.425 


1.0 


3.590 


7.891 




14 


T 4 


.765 


x 4 


1 


.743 


1.0 


19.875 


2.403 


16.44 






x 6 


4 


.562 


1.0 


20.019 


6.785 




15 


T 4 


.778 


X 1 


4 


.541 


1.0 


-.528 


1 .090 


-11.38 






x 2 


4 


.679 


1.0 


2.750 


2.605 










x 3 


4 


.511 


1.0 


.512 


1.601 










X 4 


1 


.743 


1.0 


15.251 


5.243 










x 5 


4 


.206 


1.0 


4.508 


9.285 










x 6 


4 


.562 


1.0 


18.801 


7.149 










X 7 


4 


.452 


1.0 


3.402 


2.231 




16 


T 4 


.791 


X 2 


4 


.679 


1.0 


2.470 


2.686 


-17.58 






X 4 


1 


.743 


1.0 


15.225 


4.321 










x 5 


4 


.206 


1.0 


3.156 


9.383 










x 6 


4 


.562 


1.0 


19.120 


6.726 










X 7 


4 


.452 


1 .0 


3.298 


3.375 










x 8 


4 


.482 


1 .0 


-.816 


2.632 










X 9 


4 


.483 


1.0 


-1.928 


2.315 










X 1 0 


4 


.455 


1.0 


2.259 


5.631 










X 11 


4 


.212 


1.0 


27.371 


19.599 










X 1 2 


4 


.402 


1.0 


10.503 


5.922 




1 7 a 


T 4 


.831 


x 4 


1 


.743 


1 .0 


-32.931 


7.003 


21.59 






A 34 


3 


.799 


.926 


-.124 


.034 










A 44 


3 


.830 


.959 


.789 


.124 




18 b 


T 4 


.876 


X 4 


1 


.743 


1.0 


-31 .945 


5.867 


9.00 






A 34 


3 


.799 


.926 


-.117 


.027 










A 44 


3 


.830 


.959 


.768 


.101 




9 N - 


■ 80 


















b N = 


■ 108 
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I.D. 

21 



22 



23 



Dep 

Var 



r 

.790 



.884 



.849 



APPENDIX III (continued) 



Independent Variables 



Var 


Type 


P wi th 
Dep Var 


P with 
Own Jobs 


Reg 

Coeff 


Std Err 
Reg Coeff 


X 1 


4 


.218 


1.0 


-8.916 


2.588 


x 2 


4 


.355 


1.0 


-8.379 


6.350 


x 3 


4 


.229 


1.0 


1.224 


3.809 


x 4 


4 


.405 


1.0 


35.546 


11.916 


x 5 


4 


-.0182 


1.0 


-6.944 


22.424 


x 7 


1 


.732 


1.0 


42.200 


7.978 


x 8 


4 


.614 


1.0 


6.695 


6.174 


x 9 


4 


.595 


1 .0 


-7.331 


5.417 


x 10 


4 


.606 


1.0 


36.727 


13.248 


X U 


4 


.041 


1.0 


-30.695 


46.875 


x 12 


4 


.404 


1.0 


-35.377 


14.620 


X 1 


4 


.441 


1.0 


4.479 


13.457 


x 2 


4 


.447 


1.0 


-25.097 


31.201 


X 3 


4 


.263 


1.0 


4.526 


18.721 


x 4 


4 


.460 


1.0 


45.363 


63.307 


x 5 


4 


-.068 


1.0 


-189.611 


110.226 


x 6 


4 


.207 


1.0 


-70.285 


81.423 


x 7 


4 


.709 


1.0 


24.131 


38.225 


x 8 


1 


.850 


1.0 


143.376 


30.410 


x 9 


4 


.816 


1.0 


-8.997 


26.614 


X 10 


4 


.800 


1.0 


89.684 


65.092 


X 11 


4 


.313 


1.0 


751.111 


230.484 


x 12 


4 


.626 


1.0 


49.987 


72.032 


X 1 


4 


.399 


1.0 


-8.781 


8.952 


x 2 


4 


.369 


1.0 


-26.495 


20.757 


x 3 


4 


.191 


1 .0 


-12.489 


12.454 


x 4 


4 


.474 


1.0 


76.253 


42.781 


x 5 


4 


-.007 


1.0 


103.769 


73.329 


x 6 


4 


.214 


1.0 


-56.268 


54.168 


x 7 


4 


.714 


1.0 


38.284 


26.095 


X 8 


4 


.710 


1.0 


-19.235 


20.231 



Inter- 

cept 

60.79 



562.98 



-146.98 
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I.D. 



24 



Dep 

Var r 




.789 



APPENDIX III (continued) 



Independent Variables 



Var 


Type 


P wi th 
Dep Var 


P wi th 
Own Jobs 


Ref 
Coe ff 


Std Err 
Reg Coeff 


x 9 


1 


.827 


1.0 


100.251 


17.705 


x 10 


4 


.706 


1.0 


-17.431 


43.304 


x n 


4 


.151 


1.0 


-55.856 


153.333 


X 1 2 


4 


.550 


1.0 


-51.721 


47.921 


X 1 


4 


.272 


1.0 


-2.954 


1.788 


x 3 


4 


.228 


1.0 


1.472 


2.451 


x 4 


4 


.373 


1.0 


6.697 


7.405 


x 5 


4 


-.017 


1.0 


-3.233 


14.441 


x 6 


4 


.160 


1.0 


-11.309 


10.826 


x 7 


4 


.556 


1.0 


-6.103 


5.202 


x 8 


4 


.647 


1.0 


-4.913 


3.887 


x 9 


4 


.738 


1.0 


11 .165 


3.436 


x 10 


1 


.748 


1 .0 


25.473 


8.633 


X 1 1 


4 


.080 


1.0 


-35.915 


30.483 


X 1 2 


4 


.633 


1.0 


-1.075 


9.570 



Inter- 

cept 



-3.65 
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APPENDIX IV 



Naval Weapons Center Computer Center 

The NWC Computer Center performs both batch and demand (interactive) 
processing on an open shop basis for a variety of scientific and management 
applications. The Center has approximately 700 users in total, serves 300 to 
400 users each week and processes 4,000 to 5,000 jobs per week on a five day 
per week, three shift basis [10]. About four jobs are usually processed 
concurrently. 

The data which were used in the correlation and regression analysis were 
collected from the Center's UNIVAC 1108 system. This system operated under the 
control of EXEC 8, level 28. The data were collected during a five day period 
in July 1973. A description of the sample appears in Appendix I. Since this 
time, the Center has upgraded the system to a UNIVAC 1110. 

Description of Hardware [11, 12] 

The major components of the hardware system, corresponding to the time 
period of the sample data , will now be described. 

1 . Central Processing Unit (CPU ) 

There is one CPU with 128 control registers and a 125 nanosecond access 
time. There are 155 instructions, many of which execute in 750 nanoseconds. 
The CPU uses 11 input/output channels. 

2. Main Memory 

There are three modules of 750 nanosecond cycle time core memory. Each 
module consists of two banks of 32,768 thirty-six bit words for a total of 
196,608 words. Two modules (4 banks) or 131, 072 words are available for 
user programs. A single user program is normally limited to 65,000 words. 
One module is reserved for the storage of the resident portion of EXEC 8. 
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3 . Auxiliary Storage 



FH-432 Drum Units 

• There are six flying head drum units which are used for the storage of 
non-resident portions of EXEC 8 and frequently used system processors, 
such as FORTRAN. These drums have a storage capacity of 262,144 thirty- 
six bit words, or 1,572,864 words in total, an average access time of 
4.25 milliseconds and a transfer rate of 240,000 words per second. 

FH-1782 Drum Unit 

• This unit is used for the storage of catalogues, user and production 
programs and data. Drum storage capacity is 2,097,152 thirty-six 
bit words, with an average access time of 17 milliseconds and a trans- 
fer rate of 240,000 words per second. 

AMPEX Disc Units 

• These disc units emulate the UNI VAC Fastrand drum format but operate 
at a higher speed than the UNIVAC units. These units are used for the 
storage of programs and data which are referenced and created during 
job processing. There are eighteen disc units, with each unit capable 
of storing 7,340,032, thirty-six bit words, or a total of 132,120,576 
words. The average access time is 30 milliseconds and the transfer 
rate is 52,000 words per second. 

The variables "1/0 References" and "1/0 Words Transferred," which were 
used in the correlation and regression analysis, pertain to all of the 
above auxiliary storage units. 

UNI SERVO Tape Units 

• These tape units provide permanent storage for user programs and data. 
There are thirteen 7 track drives with 200/556/800 bpi and one 9 track 
with 800/1600 bpi. Both tape units operate at 120 in. /sec. 



4. UNIVAC 9300 Computers and Data Communications Terminals 



One UNIVAC 9400 at the central site and two remotely located computers serve 
as I/O interfaces for card readers, card punches and high speed printers. 
There are also three remotely located Data Communication Terminals for 
interfacing card readers, card punches and printers. 

5. Communications Subsystem 

For demand processing, a communications subsystem is provided which con- 
sists of a Communications Terminal Module Controller for multiplexing 
terminal communication lines, and a variety of terminals (approximately 
64 in number), including UNISCOPE 300, teletype, crt and graphics terminals. 
Description of Programs [12, 13 ] 

The following is a brief description of the types of programs which were 
used in the correlation and regression analysis. Six types of programs were 
analyzed. The same types were analyzed for both batch and demand processing. 
These programs constitute approximately two-thirds of the total computing load 
of the NWC Computer Center. 

1 . FORTRAN 

A language processor which compiles FORTRAN V source language statements 
into relocatable binary code. This processor can also be used to store 
or update source statements. 

2. OTHER 

This is the name given to scientific and management production programs. 

These are executable programs which have been previously compiled or 
assembled by one of the language processors. 

3. FURPUR 

This is a set of file utility routines which is used to perform file copying, 
deletion, listing, positioning, marking, name changing, rewinding, closing 
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and punching. File operations involve the use of magnetic tape and mass 
storage units. Since a number of file options are available to the user, 
the execution characteristics of FURPUR are not constant but vary consid- 
erably as a function of the operation to be performed. 

4. MAP 

A system processor for collecting and combining relocatable program elements 
which have been generated by a language processor into executable (absolute) 
program elements. The absolute elements are structured so that the loader 
can place the absolute elements in execution. It is possible to save and 
reexecute the absolute elements many times. Recollection is only required 
when the relocatable elements have been modified. 

5. PMD (Postmortem Dump Processor ) 

At program termination, the PMD writes the final contents of a program's 
main storage area into the diagnostic file and then edits and prints the 
data. 

6. COBOL 

A language processor which compiles COBOL source language statements into 
relocatable binary code. This processor can also be used to store or 
update source statements. 

The above programs were executed under the control of the EXEC 8 operating 
system which, in total, requires two million words of storage. 
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